{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/DeepLakeIndexDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deep Lake Vector Store Quickstart"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Deep Lake can be installed using pip. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-vector-stores-deeplake"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index\n",
    "!pip install deeplake"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's import the required modules and set the needed environmental variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import textwrap\n",
    "\n",
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Document\n",
    "from llama_index.vector_stores.deeplake import DeepLakeVectorStore\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"sk-********************************\"\n",
    "os.environ[\"ACTIVELOOP_TOKEN\"] = \"********************************\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We are going to embed and store one of Paul Graham's essays in a Deep Lake Vector Store stored locally. First, we download the data to a directory called `data/paul_graham`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request\n",
    "\n",
    "urllib.request.urlretrieve(\n",
    "    \"https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\",\n",
    "    \"data/paul_graham/paul_graham_essay.txt\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now create documents from the source data file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: a98b6686-e666-41a9-a0bc-b79f0d666bde Document Hash: beaa54b3e9cea641e91e6975d2207af4f4200f4b2d629725d688f272372ce5bb\n"
     ]
    }
   ],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()\n",
    "print(\n",
    "    \"Document ID:\",\n",
    "    documents[0].doc_id,\n",
    "    \"Document Hash:\",\n",
    "    documents[0].hash,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's create the Deep Lake Vector Store and populate it with data. We use a default tensor configuration, which creates tensors with `text (str)`, `metadata(json)`, `id (str, auto-populated)`, `embedding (float32)`. [Learn more about tensor customizability here](https://docs.activeloop.ai/example-code/getting-started/vector-store/step-4-customizing-vector-stores). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\r"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Uploading data to deeplake dataset.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 22/22 [00:00<00:00, 684.80it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataset(path='./dataset/paul_graham', tensors=['text', 'metadata', 'embedding', 'id'])\n",
      "\n",
      "  tensor      htype      shape      dtype  compression\n",
      "  -------    -------    -------    -------  ------- \n",
      "   text       text      (22, 1)      str     None   \n",
      " metadata     json      (22, 1)      str     None   \n",
      " embedding  embedding  (22, 1536)  float32   None   \n",
      "    id        text      (22, 1)      str     None   \n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n",
      "\r"
     ]
    }
   ],
   "source": [
    "from llama_index.core import StorageContext\n",
    "\n",
    "dataset_path = \"./dataset/paul_graham\"\n",
    "\n",
    "# Create an index over the documents\n",
    "vector_store = DeepLakeVectorStore(dataset_path=dataset_path, overwrite=True)\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Performing Vector Search\n",
    "\n",
    "Deep Lake offers highly-flexible vector search and hybrid search options [discussed in detail in these tutorials](https://docs.activeloop.ai/example-code/tutorials/vector-store/vector-search-options). In this Quickstart, we show a simple example using default options. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\n",
    "    \"What did the author learn?\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  The author learned that working on things that are not prestigious can be a good thing, as it can\n",
      "lead to discovering something real and avoiding the wrong track. The author also learned that\n",
      "ignorance can be beneficial, as it can lead to discovering something new and unexpected. The author\n",
      "also learned the importance of working hard, even at the parts of the job they don't like, in order\n",
      "to set an example for others. The author also learned the value of unsolicited advice, as it can be\n",
      "beneficial in unexpected ways, such as when Robert Morris suggested that the author should make sure\n",
      "Y Combinator wasn't the last cool thing they did.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What was a hard moment for the author?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author experienced a hard moment when one of his programs on the IBM 1401 computer did not\n",
      "terminate. This was a social as well as a technical error, as the data center manager's expression\n",
      "made clear.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author experienced a hard moment when one of his programs on the IBM 1401 computer did not\n",
      "terminate. This was a social as well as a technical error, as the data center manager's expression\n",
      "made clear.\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What was a hard moment for the author?\")\n",
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deleting items from the database"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To find the id of a document to delete, you can query the underlying deeplake dataset directly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "./dataset/paul_graham loaded successfully.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\r"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['42f8220e-673d-4c65-884d-5a48a1a15b03']"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import deeplake\n",
    "\n",
    "ds = deeplake.load(dataset_path)\n",
    "\n",
    "idx = ds.id[0].numpy().tolist()\n",
    "idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index.delete(idx[0])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  },
  "vscode": {
   "interpreter": {
    "hash": "6e44765f40d39e4c6e3d7a9b35e5b42b8711c1c0fb3c237b84fa62e4b3e35e04"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
